1 Introduction

With the current and future availability of an increasing
number of remote sensing instruments, the problem of
storage and transmission of large volumes of data has
become a significant and pressing concern. For example, the
High-Resolution Imaging Spectrometer will acquire data at
30 meter resolution in 192 spectral bands. This translates to a
data rate of 280 Mbps! The Spaceborne Imaging Radar - C
(SIR-C) will generate data at the rate of 45 Mbps per channel
with four high data rate channels [1]. To accommodate this
explosion of data there is a critical need for data compression.
One can view the utility of data compression in two different
ways. If the rate at which data is being generated exceeds the
transmission resources, one can use data compression to
reduce the amount of data to fit available capacity. Or, given
some fixed capacity, data compression permits the gathering
of more information than could otherwise be accommodated.

In this paper, we provide a survey of current data com- 
pression techniques which are being used to reduce the 
amount of data in remote sensing applications. The survey 
aspect of this paper is far from complete, reflecting the sub- 
stantial activity in this area. The purpose of the survey is 
more to exemplify the different approaches being taken 
rather than to provide an exhaustive list of the various 
proposed approaches. For more information on compression 
techniques the reader is referred to [2, 3, 4].

Compression techniques in remote sensing applications 
can be broadly classified into three (non-distinct) categories.
These are 

1. Classification/Clustering 

2. Lossless Compression 

3. Lossy Compression 

The rationale behind the classification approaches is that
in a given dataset, the end user is generally interested in
particular features in the data. The dimensionality of these
features is generally substantially less than the dimensionality
of the data itself. Thus, rather than transmitting the data in its
entirety, if the features are extracted on-board and
transmitted this can result in a significant amount of compression.
Lossless compression techniques provide compression 
without any loss of information. That is, the raw data can be
recovered exactly from the compressed data. When some loss
is acceptable, lossy compression, which entails the discarding
of some of the information, can be used.

The utility of this approach is closely related to the amount 
of distortion incurred and the importance of fidelity in the 
particular application. The classification approaches can be 
viewed as a form of lossy compression. The three ap- 
proaches are not mutually exclusive. For example, one may 
use classification as the first step with the feature vectors 
being losslessly encoded. 

2 Classification 

If we assume an image to be composed of a small number 
of objects, then the most efficient form of data compression 
is to assign each pixel in the image to one of the objects, and 
then simply transmit the object labels to the ground. This 
idea is behind several high compression schemes which at- 
tempt to classify the pixels based on different features, and 
then transmit the classification map. 

A technique called BLOB was introduced by Kauth et al.
[5] which uses proximity information along with spectral in- 
formation for unsupervised clustering. The use of proximity 
information allows for greater ease in the classification of 
boundary pixel values, which otherwise could be classified 
to a set different from the adjacent regions. BLOB would be 
most useful in situations where objects have relatively well 
defined boundaries. 

Another object-oriented unsupervised classification
scheme is described in [6]. The authors use what they call the
path hypothesis for object classification. The path hypothesis
assumes spatial contiguity and spectral nearness for different
pixels belonging to the same object. The spectral features of
the different objects are then extracted and used to classify 
the object. They report an increase in classification accuracy 
along with a decrease in the amount of data required. 

Hilbert [7] proposed a more general clustering algorithm. 
He proposed dividing the data into blocks, and then cluster- 
ing them using an unsupervised procedure. The cluster 
centroids were then transmitted, along with a feature map 
describing the cluster to which each block belonged. This ap- 
proach does not depend on the existence of well defined 
boundaries. Hilbert's technique is a precursor to current day
Vector Quantization algorithms which are discussed later. 

A common precursor to classification is the transformation
of the data using the Karhunen-Loeve Transform. The
Karhunen-Loeve transform is used to linearly transform data
into uncorrelated coordinates. This makes the classification
task easier, as the coordinates can be clustered in a
multi-dimensional space, and then classified based on their
location in this space. The rows of the KL transform matrix
are the eigenvectors of the correlation matrix of the data.
These vectors will often be related to physical parameters.
For example, in [8] the first and second eigenvectors
correspond to the response of the dominant surface covers.

Chen and Landgrebe [8] also show that it is sufficient to
send only clipped (hard limited to ±1) eigenfunctions
along with only a fraction of the coefficients to obtain
significant classification accuracy. They therefore propose the
use of this scheme aboard the HIRIS instrument.
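The decorrelating property of the Karhunen-Loeve transform can be illustrated for the simplest case of two spectral bands, where the eigenvectors have a closed form. The following is a minimal sketch, not the procedure of [8]: the function names and the sample pixel values are hypothetical, and the mean is removed before the covariance is formed, which is an assumption on our part.

```python
import math

def cross_covariance(pairs):
    """Sample covariance between the two components of a list of pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n

def klt_two_band(pixels):
    """Return (basis, coefficients) for two-band pixels [(x, y), ...]."""
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    cxx = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    cyy = sum((y - mean_y) ** 2 for _, y in pixels) / n
    cxy = cross_covariance(pixels)
    # Principal-axis angle of the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    basis = [(c, s), (-s, c)]  # rows are the two eigenvectors
    coefficients = [
        (c * (x - mean_x) + s * (y - mean_y),
         -s * (x - mean_x) + c * (y - mean_y))
        for x, y in pixels
    ]
    return basis, coefficients

# Hypothetical, strongly correlated two-band pixel values.
pixels = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
basis, coefficients = klt_two_band(pixels)
```

The two bands of `pixels` are strongly correlated, while the two transformed coordinates in `coefficients` are uncorrelated, with most of the variance in the first one.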


3 Lossless Compression 

Lossless compression, as the name implies, consists of
reduction in the amount of data without sacrificing the
fidelity of the data. The earliest known lossless compression
technique of the technological age is probably the Morse
code. In the Morse code, letters that occur often, such as E,
are coded using short symbols, while letters that occur
relatively infrequently, such as Z, are represented by long
symbols (a single dot for E and dash dash dot dot for Z). This
idea (albeit in more sophisticated form) is at the heart of
most lossless compression schemes. In 1948 Claude Shannon
defined the amount of information contained in the event X
as $\log_a \frac{1}{P(X)}$ [9], where P(X) is the probability of
the event X and a is the base of the logarithm. If a = 2 the
unit of information is bits. If we define $X^n$ to be the
sequence of observations $(X_0, X_1, \ldots, X_{n-1})$, then the
entropy of the source generating the sequence is defined as

$$H(S) = \lim_{n \to \infty} G_n$$

where

$$G_n = \frac{1}{n} \sum P(X^n) \log_2 \frac{1}{P(X^n)}$$

and the sum is taken over all sequences of length n.
Shannon [9] showed that the minimum average rate at
which the output of the source S can be encoded is H(S)
bits/symbol. If the source outputs $\{X_i\}$ are independent then
the expression for entropy reduces to

$$H(S) = G_1 = \sum P(X_1) \log_2 \frac{1}{P(X_1)}$$

Given a sequence of independent observations, Huffman
[10] developed an algorithm which provides a variable
length code which gives an average coding rate R, where
$H(S) \le R \le H(S) + 1$. The algorithm assigns shorter
codewords to more probable symbols and longer codewords
to less probable symbols a la Morse. Another technique
which operates on sequences rather than individual letters is
Arithmetic Coding. The Arithmetic coding algorithm
guarantees an average coding rate R where
$H(S) \le R \le H(S) + \frac{2}{n}$, n being the length of the
sequence. If the statistics of the sequence change with time,
these techniques will suffer some degradation. To combat this
several adaptive coding techniques have been proposed
including dynamic Huffman coding [11], adaptive arithmetic
coding [12] and the Rice algorithm [13]. The Rice algorithm
has been shown to be optimal under some widely applicable
conditions [14], and has been implemented in a VLSI chip
which can process 20 MBytes per second [15].

If the observations are not independent then the code
designed using the first order probabilities $P(X_1)$ is only
guaranteed to be within one bit of $G_1$, which may be
substantially greater than H(S). Because of this fact lossless
compression consists of two steps: decorrelation and coding.

The first step can be seen as an 'entropy reduction' step in
which the redundancy or correlation of the data is removed
(reduced). This results in another sequence which has a first
order entropy $G_1$ which can be significantly lower than the
first order entropy of the original sequence. Now if a
variable length code is designed using the first order
probabilities of the decorrelated data, this will result in a lower
rate/higher compression. Consider for example the following
sequence

1 2 3 4 5 4 3 2 1 2 3 2 1 2 3 4 5 4 3 2 3 4 5

Estimating the first order probabilities from the sequence we
obtain

$$P[1] = P[5] = \frac{3}{23}; \quad P[2] = P[3] = \frac{6}{23}; \quad P[4] = \frac{5}{23}$$

which gives a value for $G_1$ of 2.25 bits/sample. It is obvious
from looking at the data that it possesses some definite
structure. Some of this structure can be removed by storing
consecutive differences. The original data can be reconstructed
(without loss) by simple addition. The difference data is

1 1 1 1 1 -1 -1 -1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 1 1 1

The difference can be represented using a binary
alphabet, so the coding rate can immediately be lowered to
one bit/sample. To see what the value of $G_1$ is we first
compute the first order probabilities as

$$P[1] = \frac{14}{23}, \quad P[-1] = \frac{9}{23}$$

which gives an entropy of 0.96 bits/sample. In this particular
case the gain of 0.04 bits per sample may not be worth the
additional complexity required for a variable length code.
Notice that in this case the compression was obtained mainly
due to the decorrelation step. Because of this, research in
lossless compression is focusing more and more on the
development of better decorrelation algorithms. An idea of
how much decorrelation gain is available can be obtained by
looking at the conditional entropy.
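The numbers in the example above can be checked with a short script. The sketch below estimates the first order entropy of the sequence and of its differences, and builds a Huffman code from the empirical counts to confirm the bound on the average rate. The helper names (`first_order_entropy`, `huffman_lengths`, `average_rate`) are hypothetical, and the Huffman construction is a textbook one rather than any particular implementation cited here.

```python
import heapq
import math
from collections import Counter
from itertools import accumulate

def first_order_entropy(symbols):
    """Empirical first order entropy in bits/sample."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def huffman_lengths(symbols):
    """Codeword length of each symbol in a Huffman code built from counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols under this node).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to these codewords
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

def average_rate(symbols, lengths):
    """Average codeword length in bits/sample."""
    return sum(lengths[s] for s in symbols) / len(symbols)

data = [int(ch) for ch in "12345432123212345432345"]
# First sample followed by consecutive differences.
residual = [data[0]] + [b - a for a, b in zip(data, data[1:])]

g1_data = first_order_entropy(data)
g1_residual = first_order_entropy(residual)
rate_data = average_rate(data, huffman_lengths(data))
restored = list(accumulate(residual))  # reconstruction by simple addition
```

Here `g1_data` is about 2.25 bits/sample, `g1_residual` is about 0.96 bits/sample, `rate_data` lies within one bit of `g1_data`, and `restored` equals `data`.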

In the example given above, the data was one dimen- 
sional so the prediction used to generate the difference or 
residual data was also one dimensional. In the case of 
remotely sensed images, the data is generally three dimen- 
sional: two spatial dimensions and a spectral dimension. In 
these cases, it would seem reasonable to use prediction based 
on all three dimensions. Chen et al. [16] compute the
theoretical advantages to be gained from using prediction 
based on all three dimensions. They show that while there is 
some advantage to be gained from using more than one 
dimensional prediction, the increase in compression is small. 
However, if the increase in complexity of going from one to 
two or three dimensions is acceptable (and it can be argued 
that the increase in complexity is minimal), it would seem 
reasonable to use multi-dimensional prediction to decorrelate 
the data. 
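The difference between one and two dimensional prediction can be sketched on a small synthetic image. The image below is deliberately constructed so that the effect is large; as noted above, the gain reported in [16] for real data is small. The predictor functions and the test image are hypothetical, and only the two spatial dimensions are used (a spectral neighbour could be passed to the predictor in the same way).

```python
import math
from collections import Counter

def entropy(values):
    """Empirical first order entropy in bits/sample."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def residuals(image, predictor):
    """Prediction errors for pixels with left, upper and upper-left neighbours."""
    out = []
    for r in range(1, len(image)):
        for c in range(1, len(image[0])):
            left = image[r][c - 1]
            up = image[r - 1][c]
            upleft = image[r - 1][c - 1]
            out.append(image[r][c] - predictor(left, up, upleft))
    return out

def predict_left(left, up, upleft):
    """One dimensional prediction: previous pixel on the same line."""
    return left

def predict_plane(left, up, upleft):
    """Two dimensional prediction: plane through the three neighbours."""
    return left + up - upleft

# Synthetic image whose horizontal slope changes from line to line.
image = [[(r + 1) * (c + 1) for c in range(8)] for r in range(8)]
h_left = entropy(residuals(image, predict_left))
h_plane = entropy(residuals(image, predict_plane))
```

For this image the one dimensional residual takes a different value on every line, while the two dimensional residual is constant, so `h_plane` is zero.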

A somewhat different approach is adopted by Memon et
al. [17, 18]. They reason that in an image the correlations
may be maximum in the vertical, horizontal or diagonal
direction depending on the object being imaged. Therefore,
one should use whichever pixel gives the most decorrelation
for prediction. They therefore develop the concept of
prediction (or scanning) trees for performing the decorrelation. The
drawback with this approach when coding single images is
that the cost of encoding the prediction tree may eat up any
savings due to better decorrelation. In the case of
multispectral images, because the same prediction tree can be
used to code a large number of bands, the relative cost of
encoding the prediction tree is small enough not to overwhelm
the savings obtained via this approach [19].

In all that we have discussed above, we have taken a
rather general view of the lossless compression problem.
When faced with a specific problem, one can often come up
with a simpler, more efficient solution. Consider the problem
of encoding the output of a spectrometer. A general algorithm
such as the Rice algorithm will do a nice job of encoding the
output of the spectrometer. However, given the very special
structure of the data (the data looks like a noisy decaying
exponential) one can come up with techniques, as in
[20], which are simpler and give better performance. Similarly,
Stearns et al. [21] develop a lossless compression scheme
tuned to the peculiarities of seismic data. When using
application specific algorithms, the user should be aware of the
fact that if the data sequence deviates from the assumed
structure, this may result in performance loss.
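As an indication of how simple a general purpose coder for prediction residuals can be, the following sketch implements a fixed-parameter Golomb-Rice code, the basic building block usually associated with the Rice algorithm mentioned above. It is an illustration under our own assumptions: the adaptive selection among several code options that, as we understand it, the algorithm of [13] performs for each block of data is not shown, and the function names are hypothetical.

```python
def zigzag_map(residual):
    """Map signed residuals 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * residual if residual >= 0 else -2 * residual - 1

def rice_encode(value, k):
    """Encode a non-negative integer: unary quotient, then k-bit remainder."""
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    bits = "1" * quotient + "0"          # unary part, terminated by a zero
    if k:
        bits += format(remainder, "0{}b".format(k))
    return bits

def rice_decode(bits, k):
    """Invert rice_encode for a single codeword."""
    quotient = bits.index("0")           # number of leading ones
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k else 0
    return (quotient << k) | remainder

# Encode a few signed residuals with parameter k = 1.
residuals = [0, 1, -1, 2, -3]
codewords = [rice_encode(zigzag_map(r), 1) for r in residuals]
```

Small residuals receive short codewords, and the parameter k controls how quickly the codeword length grows with the magnitude of the residual.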

Finally, lossless coding can be used in conjunction with 
other techniques. Several schemes in the literature use loss- 
less compression as the second stage, where the first stage is 
feature extraction or lossy compression [22, 23, 24].


4 Lossy Compression 

In many applications, loss of information which is not
perceptually significant can be easily tolerated. In fact in certain
cases, such as processing of SAR data [25], the information
lost may actually be the noise. In these cases, it makes sense
to use lossy compression techniques which provide much
higher compression than the lossless techniques. However,
before we extol the virtues of various lossy compression
techniques, one should keep in mind the importance of
carefully picking the distortion measure. Most of the
compression schemes described here use the mean squared error (or
some variant) as the distortion measure. The mean squared
error is defined as

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2$$

where $x_i$ is the original data value and $\hat{x}_i$ is the
reconstructed (compressed and then decompressed) value. Note
that this is an average measure; it therefore spreads out the
error effects at any one location. Under this measure, a large
error in one sample value with no or little error in the other
N - 1 sample values may be equivalent to small errors in all N
sample values. If the application requires that each sample
value be represented within some tolerance, then the MSE is
probably not the distortion measure that should be used.
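The point about the averaging nature of the MSE can be made concrete with two reconstructions of the same data. In the sketch below (the sample values and function names are hypothetical) one reconstruction has a single large error and the other has a small error in every sample, yet the two have the same MSE; a maximum absolute error measure separates them.

```python
def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences."""
    n = len(original)
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n

def max_abs_error(original, reconstructed):
    """Largest per-sample error, a measure suited to a fixed tolerance."""
    return max(abs(x - y) for x, y in zip(original, reconstructed))

original = [10.0] * 16
one_large_error = [14.0] + [10.0] * 15   # a single sample off by 4
many_small_errors = [11.0] * 16          # every sample off by 1

mse_large = mse(original, one_large_error)
mse_small = mse(original, many_small_errors)
```

Both `mse_large` and `mse_small` equal 1.0, while the maximum absolute errors are 4.0 and 1.0 respectively.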

4.1 Quantization

The heart (and sometimes the totality) of most lossy com- 
pression schemes is the quantization process. Quantization is 
a many to one mapping from a possibly infinite set to a finite 
set. The input to the quantizer can be a scalar, in which case 
the quantizer is called a scalar quantizer, or a vector in which 
case the quantizer is called a vector quantizer (VQ). The 
scalar quantizer is simply a concatenation of an A/D and a 
D/A. A simple A/D is shown in Figure 1. Assuming $\Delta = 1$, in
this A/D if the input falls in the range (0,1], the output is the
codeword 10, if the input falls in the range $(-\infty,-1]$ the
output is the codeword 00, and so on. The D/A takes the
codeword produced by the A/D and generates a real value 
corresponding to the interval represented by the codeword. 

Figure 1. A two bit A/D

Figure 3. DPCM structure

In our simple example if the codeword 00 is received the
D/A will put out a value of -1.5. The input/output map for
this quantizer (A/D-D/A combination) is shown in Figure 2.
Figures 1 and 2 describe a two bit uniform quantizer. If the
stepsize $\Delta$ is not constant for the different intervals, the
quantizer is called a non-uniform quantizer. Given information
about the statistics of the input signal, Max [26] and Lloyd
[27] have developed algorithms for the design of optimum
uniform and non-uniform quantizers for memoryless
sources. Kwok and Johnson [28] use a two bit quantizer designed
for Gaussian data to code SAR data from the Magellan
mission. The SAR data is originally at 8-bit resolution, so the
compression ratio is 4:1. To accommodate the rather large
dynamic range of the SAR data, the quantizer is adapted on
a block by block basis, using the average signal magnitude.
The signal magnitudes in a block are used to compute a
threshold value which is used in place of $\Delta$ in Figure 1. The
outputs of the D/A are the optimum values for a Gaussian
input with variance of one multiplied by the computed
threshold.
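The two bit uniform quantizer described above can be written down in a few lines. The text specifies the intervals and codewords for 00 and 10 and the reconstruction value for 00; the remaining assignments (01 for (-Δ, 0], 11 for (Δ, ∞), and reconstruction values at the interval midpoints) are our assumption, since Figures 1 and 2 are not reproduced here. The names `adc` and `dac` are hypothetical.

```python
import math

CODEWORDS = ["00", "01", "10", "11"]

def adc(x, delta=1.0):
    """Two bit A/D with intervals (-inf,-d], (-d,0], (0,d], (d,inf)."""
    index = math.ceil(x / delta) + 1     # (0, d] maps to index 2
    index = min(max(index, 0), 3)        # clamp into the two outer intervals
    return CODEWORDS[index]

def dac(codeword, delta=1.0):
    """D/A: one representative value for each interval."""
    return (CODEWORDS.index(codeword) - 1.5) * delta

def quantize(x, delta=1.0):
    """The scalar quantizer is the concatenation of the A/D and the D/A."""
    return dac(adc(x, delta), delta)

inputs = [-2.7, -0.4, 0.3, 1.0, 5.0]
outputs = [quantize(x) for x in inputs]
```

A block adaptive scheme along the lines described for the Magellan data would call `quantize` with `delta` set to the threshold computed for each block, with non-uniform reconstruction values in place of the midpoints used here.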

The SIR-C [1] uses 8 bit uniform quantization followed 
by a feature which allows it to reduce the number of bits per
sample to facilitate the acquisition of more samples. Data 
compression thus allows the acquisition of more data at the 
cost of reduced resolution. 

In some cases it might be more efficient to quantize some 
function of the data rather than the data itself. Dubois et al.
[29] compress the output of an imaging radar polarimeter by
first obtaining the Stokes matrix from the scattering 
matrices. Four Stokes matrices from contiguous pixels are 
added to form one four-look Stokes matrix. The elements of 
the four-look Stokes matrix are then quantized. The ad- 
vantage to this approach is that the elements of the Stokes 
matrix have certain well defined properties which can be 
used in the quantization process of the Stokes matrix. 

4.2 DPCM 

Figure 4. DPCM coding of a one dimensional edge

The relationship between the variance of the input to the
quantizer and the MSE can be given by the following
relationship,

$$\mathrm{MSE} = \epsilon^2 \, 2^{-2R} \, \sigma_x^2$$

where $\epsilon^2$ depends on the input probability density function,
$\sigma_x^2$ is the variance of the input signal,
and R is the number of bits/sample. As can be seen from this
expression, the MSE is proportional to the input signal
variance. Therefore, if we could reduce the input signal
variance this would lead to a reduction in the MSE. (It
should be noted that the operations to remove the
redundancy could also change the input pdf which may diminish the
benefits of a reduced variance.) This is the motivation for a
class of lossy compression schemes known as Differential
Pulse Code Modulation (DPCM) schemes. DPCM schemes
remove redundancy in the source sequence by using the
correlation in the source sequence to predict ahead. The
predicted value is removed from the signal at the transmitter
and reintroduced at the receiver. The prediction error, which
has a smaller variance than the input signal, is then quantized
and transmitted to the receiver. A block diagram of a
DPCM system is shown in Figure 3. This technique is used
in the coding of the SPOT satellite's panchromatic band.
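The structure of a DPCM coder can be sketched as follows. This is a minimal illustration under our own assumptions, not the SPOT coder: the predictor is simply the previous reconstructed sample, the quantizer is uniform with an unlimited range, and the function names and sample values are hypothetical. The essential feature is that the encoder forms its prediction from the reconstructed values, so that encoder and decoder stay in step.

```python
def quantize(value, step):
    """Uniform quantizer with unlimited range (no overload)."""
    return step * round(value / step)

def dpcm_encode(samples, step):
    """Return the quantized prediction errors for a sequence of samples."""
    prediction = 0.0
    quantized_errors = []
    for x in samples:
        q = quantize(x - prediction, step)   # quantize the prediction error
        quantized_errors.append(q)
        prediction += q                      # track the decoder's reconstruction
    return quantized_errors

def dpcm_decode(quantized_errors):
    """Reintroduce the prediction at the receiver."""
    prediction = 0.0
    reconstructed = []
    for q in quantized_errors:
        prediction += q
        reconstructed.append(prediction)
    return reconstructed

samples = [10.2, 10.9, 11.7, 12.1, 11.8, 11.2]
step = 0.5
errors = dpcm_encode(samples, step)
reconstructed = dpcm_decode(errors)
```

After the first sample the quantized prediction errors in `errors` are much smaller in magnitude than the samples themselves, which is the variance reduction that motivates DPCM, and each reconstructed value is within half a step of the original.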

Figure 5. Original aerial view of Omaha

While DPCM coding performs well in quasi-stationary
regions of an image, it does a poor job in edge regions. The
reason for this is that the prediction in DPCM uses the
previous reconstructed pixels. In an edge region, the prediction
error is quite large. Therefore, the input to the quantizer
lands in one of the outer regions ($(-\infty,-1]$, $[1,\infty)$ in our
example). The quantization error can therefore be quite large.
This is fed back via the prediction process into the coding of
the next pixel, and so on, causing a smearing of the edges.
This process is demonstrated on a one-dimensional 'edge' in
Figure 4. This problem can be overcome by using recursively
indexed quantization [30, 31] which avoids the large
quantization error problem by operating the quantizer in two
different modes. Whenever the input to the quantizer falls in
the external regions, the quantizer switches into a recursive
mode, and the quantization error is requantized until the
error falls within some predetermined tolerance. This
approach not only prevents large quantization errors from
propagating through the coded sequence, it also guarantees
that the error per pixel will be less than a pre-determined
value. To show how well this scheme works, we code the
aerial view of Omaha shown in Figure 5. The compressed
(and decompressed) image coded using the DPCM scheme
described above at a rate of 1.4 bits per pixel is shown in
Figure 6. Note that while there is an overall increase in
'blurriness' the distortion introduced does not blur the edges.
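The edge smearing effect, and the way requantization of the error removes it, can be reproduced on a synthetic one-dimensional edge. The sketch below is a simplified reading of the recursive idea and is not claimed to be the scheme of [30, 31]: the four output levels are those of the two bit quantizer with unit stepsize, the tolerance is fixed at half a step, and all names are hypothetical.

```python
LEVELS = [-1.5, -0.5, 0.5, 1.5]
TOLERANCE = 0.5  # half the stepsize; guarantees the recursion terminates

def quantize_bounded(value):
    """Ordinary two bit quantizer: nearest of the four output levels."""
    return min(LEVELS, key=lambda level: abs(value - level))

def quantize_recursive(value):
    """Requantize the remaining error until it is within the tolerance.

    Returns the list of emitted levels; their sum is the reconstruction.
    """
    emitted = []
    remainder = value
    while True:
        level = quantize_bounded(remainder)
        emitted.append(level)
        remainder -= level
        if abs(remainder) <= TOLERANCE:
            return emitted

def dpcm(samples, recursive):
    """DPCM with previous-sample prediction; returns the reconstruction."""
    prediction = 0.0
    reconstructed = []
    for x in samples:
        error = x - prediction
        q = sum(quantize_recursive(error)) if recursive else quantize_bounded(error)
        prediction += q
        reconstructed.append(prediction)
    return reconstructed

edge = [0.0] * 3 + [10.0] * 8
smeared = dpcm(edge, recursive=False)
sharp = dpcm(edge, recursive=True)
```

With the ordinary quantizer the reconstruction can climb by at most 1.5 per sample, so `smeared` takes several samples to reach the new level. In `sharp` every sample is within the tolerance of the original, at the cost of emitting several codewords for the sample at the edge.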

While the DPCM structure removes substantial amounts 
of the redundancy from the data stream, it should be remem- 
bered that the prediction process in the DPCM structure is 
linear, and can therefore remove only those redundancies 
which are expressed as linear processes. For example, a
slowly varying sequence 1 2 3 4 5 4 3 3 4 5 6 7 7 7 6 has
redundancies that can be modeled by a linear process. However,
we can easily come up with sequences that have
redundancies that cannot be characterized by a linear process such as
4 24 15 19 4 24 15 19 .... This fact has been used by some to
improve the data compression by making use of this redun- 
dancy for code selection [32], and by others for providing 
error protection [33]. 

4.3 Vector Quantization 

Until now we have been talking about quantization as a
scalar process; however, the basic idea of quantization can
easily be extended to the vector case. Scalar quantization can
be viewed as a partition of the real number line, with the
A/D doing the partitioning, and the D/A providing a
representative value for each partition. Similarly, vector
quantization can be seen as a partitioning of multidimensional space.
While conceptually the problems of scalar and vector
quantization approaches are very similar, the practical problem of
designing vector quantizers is significantly more difficult.

Figure 6. DPCM coded Omaha image at 1.4 bpp

Figure 7. A four level, two dimensional vector quantizer

Figure 8. VQ coded Omaha image at 0.5 bpp

Two somewhat different approaches have been taken
towards the design of vector quantizers. The first is a
clustering approach similar to the Hilbert technique [7]. In this
approach [34], a training sequence is used to identify the
regions in multi-dimensional space where the data seems to
cluster. The quantizer outputs are the centroids of these
clusters, and the partitions are the nearest neighbor partitions
of these centroids. An example of a two dimensional vector
quantizer is shown in Figure 7. The VQ in Figure 7 contains
4 output levels, or codewords. Thus the size of each
codeword is two bits. But each output level corresponds to
the coding of two input samples; therefore, the number of
bits per sample is one. In general, given the dimension of the
vector d and the number of bits per sample R, the size of the
vector codebook is $2^{Rd}$. Notice that this means an
exponential increase in the size of the codebook with dimensionality
and rate. For example, given d = 12 and R = 2, the size of the
codebook would be $2^{24} = 16777216$! This represents an
enormous expense in storage and computing resources. Thus the
rate-dimension product provides a limitation on the clustered
VQ designs. Fortunately, a lot can be done at low
rate-dimension products. For more moderate rate-dimension products a
number of somewhat more structured VQ algorithms have
been developed [35]. Chang et al. [25] report the use of a
tree-structured VQ on Seasat SAR imagery with favorable
results. As the codebook of the VQ is obtained by training, it
is important that the data in the training set be representative
of the data in the test set. If this is not the case, there can be
significant degradation in the data that is not typical of the
training set [25]. The Omaha image of Figure 5 is coded
using a clustered VQ at 0.5 bits per pixel. The result is
shown in Figure 8. The VQ dimension was 16 (4x4 blocks)
and this is evident from the coded image in Figure 8, where
there is a noticeable amount of blockiness. The blocks that
lie on the edges of objects in the image clearly distort the
edges. The VQ codebook was obtained using another aerial
image. We can improve the performance of this algorithm
by increasing the rate and/or by generating the codebook
from an image which more closely resembles the image
being coded. In Figure 9 we have the Omaha image coded at
1 bit per pixel using a codebook generated using the Omaha
image itself. There is substantial improvement in the quality,
though there is still some distortion in the lower quarter of
the picture. It should be noted that the use of the image to
generate the codebook is generally not realistic.
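The clustering approach to codebook design can be sketched with a generalized Lloyd iteration, which alternates between a nearest neighbor partition of the training set and a centroid update. This is one common way to realize such a design; it is an assumption on our part that it matches the procedure of [34] in detail. The training vectors, the initial codebook, and the function names are hypothetical.

```python
def nearest(vector, codebook):
    """Index of the codeword closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )

def train_codebook(training, codebook, iterations=20):
    """Alternate nearest neighbor partitioning and centroid updates."""
    codebook = [tuple(c) for c in codebook]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for vector in training:
            cells[nearest(vector, codebook)].append(vector)
        # Replace each codeword by the centroid of its cell (keep it if empty).
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else old
            for cell, old in zip(cells, codebook)
        ]
    return codebook

def codebook_size(rate, dimension):
    """Number of codewords needed for `rate` bits/sample at this dimension."""
    return 2 ** (rate * dimension)

# Two dimensional training vectors clustered around four points.
training = [(0.1, 0.2), (0.0, -0.1), (4.9, 5.1), (5.2, 4.8),
            (0.2, 5.0), (-0.1, 4.9), (5.0, 0.1), (5.1, -0.2)]
initial = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.0)]
codebook = train_codebook(training, initial)
indices = [nearest(v, codebook) for v in training]
```

With four codewords and two samples per vector, each index in `indices` costs two bits, or one bit per sample, as in the example of Figure 7. The exhaustive search in `nearest` is what makes large rate-dimension products expensive.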

Vector Quantization is also used by Gupta and Gersho for
the coding of Landsat TM images [36]. They use a vector
DPCM system with vector quantization in the spatial
domain, and predictive encoding in the spectral domain. A
variation of predictive VQ is also used by Giusto [37] for the
compression of multispectral images.

The rate-dimension product constraint on vector quan- 
tizers can be lifted by making the vector quantizer more and 
more structured. Of course, as the VQ acquires more and 
more structure of its own, it is less and less responsive to 
structure in the data. The most structured vector quantizers 
are those based on a multi-dimensional lattice [38]. While 
these quantizers do an excellent job of quantization, they can- 
not at the same time perform the redundancy removal opera- 
tion performed by the clustered VQs. They therefore have to 
be used in conjunction with other techniques to provide
compression [39, 40].
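The reason lattice quantizers escape the rate-dimension constraint is that no codebook needs to be stored or searched: the nearest lattice point is found by a few arithmetic operations. As an illustration we use the lattice $D_n$ (integer vectors whose coordinates sum to an even number), for which a well known rounding procedure finds the closest point. The choice of $D_n$ is ours for the purpose of the example and is not taken from [38]; the function names are hypothetical.

```python
import math

def round_half_up(x):
    """Round to the nearest integer, with ties going upward."""
    return math.floor(x + 0.5)

def nearest_dn_point(vector):
    """Closest point of D_n: integer vectors with an even coordinate sum."""
    rounded = [round_half_up(x) for x in vector]
    if sum(rounded) % 2 == 0:
        return rounded
    # Parity is wrong: take the coordinate with the largest rounding error
    # and round it the other way, which is the cheapest correction.
    worst = max(range(len(vector)), key=lambda i: abs(vector[i] - rounded[i]))
    rounded[worst] += 1 if vector[worst] > rounded[worst] else -1
    return rounded

point_a = nearest_dn_point([0.6, 0.1])
point_b = nearest_dn_point([1.2, 2.9, -0.4])
```

The cost of `nearest_dn_point` grows only linearly with the dimension, in contrast to the exponential growth of a clustered codebook.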

4.4 Transform Coding 

Most of the techniques we have talked about operate in
the data domain, i.e. without any transformation. There is a
large class of compression techniques that operate on a
transformed version of the data. They are called transform coding
techniques. The idea behind transform coding is to transform
the data in such a way as to compact most of the energy (and
information) into a few coefficients. These coefficients can
then be coded, while other coefficients can be discarded
thereby achieving data compression. The most efficient
transform from the compaction point of view is the
Karhunen-Loeve transform [2]. However, the Karhunen-Loeve
transform is data dependent which makes it impractical for
most compression applications. The best alternative to the
Karhunen-Loeve transform is the Discrete Cosine
Transform (DCT). This is a real, separable, unitary transform
that is the basis for an image compression standard [41].
Because of its popularity in image compression various fast
algorithms have been proposed for its implementation [42,
43].

Figure 9. VQ coded Omaha image at 1.0 bpp

Figure 10. Zig zag ordering

The ESA Huygens Titan Probe to be launched by the
Cassini Orbiter will use the DCT for compressing the image data
acquired during its descent through Titan's atmosphere. The
images of size 256x256 will be divided into 8x8 blocks.
These blocks will be transformed and the transform
coefficients reordered using the zigzag ordering shown in Figure
10. The ordered coefficients will then be blocked into
substrings of four coefficients each. Substrings with all coefficient
values below a specified threshold will be deleted while the
remainder will be quantized using scalar quantizers. Details
can be found in [44, 45].
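The processing chain described above (block transform, zigzag reordering, deletion of substrings below a threshold) can be sketched as follows. This is an illustration under stated assumptions and not the Huygens implementation of [44, 45]: the zigzag order is the familiar anti-diagonal scan, which we assume corresponds to Figure 10, the threshold and test block are arbitrary, the scalar quantization step is omitted, and all function names are hypothetical.

```python
import math

N = 8

def dct_1d(vector):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n, v in enumerate(vector)))
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def zigzag_order():
    """Index pairs along anti-diagonals, alternating direction (assumed scan)."""
    order = []
    for s in range(2 * N - 1):
        diagonal = [(r, s - r) for r in range(N) if 0 <= s - r < N]
        order.extend(diagonal if s % 2 else diagonal[::-1])
    return order

def keep_substrings(coefficients, threshold, length=4):
    """Split into substrings; delete those with all values below threshold."""
    kept = []
    for start in range(0, len(coefficients), length):
        chunk = coefficients[start:start + length]
        if any(abs(c) >= threshold for c in chunk):
            kept.append((start, chunk))
    return kept

block = [[float(r + c) for c in range(N)] for r in range(N)]  # smooth test block
coeffs = dct_2d(block)
ordered = [coeffs[r][c] for r, c in zigzag_order()]
kept = keep_substrings(ordered, threshold=1.0)
```

For a smooth block such as this one, the energy is compacted into a few low frequency coefficients, so only a small number of the sixteen substrings survive the threshold test.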


To see the artifacts introduced by DCT coding we have
coded the Omaha image at 0.5 bits per pixel and 1 bit per
pixel as shown in Figures 11 and 12. Note the substantial
block artifacts in Figure 11 which have been reduced to a
large extent in Figure 12. However, even in Figure 12 one
can see significant distortion in edge regions.

An adaptive version of DCT was also considered by
Chang et al. [25] for the compression of Seasat SAR
imagery. They compare the DCT technique with a VQ
technique and decide in favor of the VQ technique based on
complexity issues. With the wide acceptance of the DCT as
an image compression standard, the complexity issue may
no longer be relevant, as more and more manufacturers are
bringing hardware implementations of the DCT to the
market.

5 Conclusions 

As can be seen from this discussion, there is a substantial
amount of on-going activity in the area of data compression
for remote sensing applications. This will only increase as
there is more and more need for data compression. However,
there are several areas of research which have not been
addressed in any significant way.

There is a need for the development of better distortion 
measures which can be then used to develop more sophisti- 
cated compression algorithms. It is possible that rather than a 
single distortion measure, a set of distortion measures will be 
needed for different applications. The development of such 
measures, and algorithms utilizing these measures, requires
close cooperation between data compression specialists and
the scientists and engineers who are the end-users of the data
obtained through remote sensing. 

The multi-dimensional (spatial and spectral) nature of the
data has not really been thoroughly explored (except in the
classification approaches). With the development and
deployment of high spectral resolution instruments, this
particular aspect of remotely sensed data will become more
important. Compression schemes which take advantage of this
fact need to be developed. An analogy could be drawn with
the development of compression algorithms for video as
opposed to still images. However, the algorithms developed for
video cannot be directly applied to high spectral resolution
image data sets, as the differences that occur between frames
of a video sequence are not the same as the differences that
occur between different spectral images. It would seem that
VQ approaches such as [36] would provide possible
solutions. The rate-dimension constraints in clustering VQ could
be avoided by the use of Lattice VQ techniques. Another
approach described in [46] is to use a two step strategy, in
which the first step is used to model the data in the spectral
direction. The resulting models are then treated as a vector
image for compression in the spatial directions. Beyond this,
however, there is a need for the development of three
dimensional approaches, both to model the data, and develop
compression algorithms.

Figure 11. DCT coded Omaha image at 0.5 bpp.

Figure 12. DCT coded Omaha image at 1.0 bpp.